This section introduces the project motivation, the data, and the research question, and it includes a data dictionary.
Over the last 30 years, dating has changed and has become increasingly difficult. The willingness to date has decreased, dating is too expensive and time-consuming, there are too many (perceived) options, and people too readily accept negative sex stereotypes.
In the 19th century, a custom in the United States known as New Year's Calling had many young, single women hold an open house (a party or reception during which a person's home is open to visitors) on 1 January, inviting eligible bachelors, both friends and strangers, to stop by for a brief visit of no more than 10 to 15 minutes. The idea was later established under the name SpeedDating, a registered trademark of Aish HaTorah, which began hosting SpeedDating events in 1998.
Some years later, Fisman et al. conducted a survey on speed dating behavior and collected more than 8,000 observations over a two-year period, which they describe in their paper Gender Differences in Mate Selection: Evidence from a Speed Dating Experiment. Because interest in speed dating has grown in recent years, and because the COVID-19 pandemic has given rise to a completely new approach to dating, we want to examine the relationships within speed dating. With the data from this survey, we want to answer the following research questions:
To answer our research question, we defined the following sub-questions to strengthen our main research question:
The following hypotheses support our research question:
Null hypothesis:
Hypotheses:
| Name | Description | Role | Type | Format |
|---|---|---|---|---|
| iid | Unique subject number (wave + id + gender) | ID | numeric | int |
| id | Subject number within wave | ID | numeric | int |
| gender | Gender of the person. Female = 0, Male = 1 | predictor | nominal | category |
| idg | Subject number within gender (id + gender) | ID | numeric | int |
| condtn | Condition of the wave, 1 = Limited choice, 2 = extensive choice | predictor | nominal | category |
| wave | ID of the event | ID | numeric | int |
| round | Number of people that met in wave | predictor | numeric | int |
| position | Station number where met partner | predictor | numeric | int |
| positin1 | Station number where started | predictor | numeric | int |
| order | The number of date that night when met partner | predictor | numeric | int |
| partner | Partner's ID number the night of event | ID | numeric | int |
| pid | Partner's IID number | ID | numeric | int |
| match | 1 = yes, 0 = no | response | nominal | category |
| int_corr | Correlation between participant's and partner's ratings of interests in Time 1 | predictor | numeric | float |
| samerace | Participant and the partner were the same race. 1 = yes, 0 = no | predictor | nominal | category |
| age_o | Age of partner | predictor | numeric | int |
| race_o | Race of partner | predictor | nominal | category |
| pf_o_att | Partner's stated preference at Time 1. The sum of all pf_o_ elements must be 100. | predictor | numeric | float |
| pf_o_sin | Partner's stated preference at Time 1. The sum of all pf_o_ elements must be 100. | predictor | numeric | float |
| pf_o_int | Partner's stated preference at Time 1. The sum of all pf_o_ elements must be 100. | predictor | numeric | float |
| pf_o_fun | Partner's stated preference at Time 1. The sum of all pf_o_ elements must be 100. | predictor | numeric | float |
| pf_o_amb | Partner's stated preference at Time 1. The sum of all pf_o_ elements must be 100. | predictor | numeric | float |
| pf_o_sha | Partner's stated preference at Time 1. The sum of all pf_o_ elements must be 100. | predictor | numeric | float |
| dec_o | Decision of partner the night of event | predictor | nominal | category |
| attr_o | Attractive. Rating by partner the night of the event from 1 (awful) to 10 (great) | predictor | numeric | int |
| sinc_o | Sincere. Rating by partner the night of the event from 1 (awful) to 10 (great) | predictor | numeric | int |
| intel_o | Intelligent. Rating by partner the night of the event from 1 (awful) to 10 (great) | predictor | numeric | int |
| fun_o | Fun. Rating by partner the night of the event from 1 (awful) to 10 (great) | predictor | numeric | int |
| amb_o | Ambitious. Rating by partner the night of the event from 1 (awful) to 10 (great) | predictor | numeric | int |
| shar_o | Shared Interests/Hobbies. Rating by partner the night of the event from 1 (awful) to 10 (great) | predictor | numeric | int |
| like_o | Overall, how much do you like this person? 1 (don't like at all) to 10 (like a lot) | predictor | numeric | int |
| prob_o | How probable do you think it is that this person will say 'yes' for you? 1 (not probable) to 10 (extremely probable) | predictor | numeric | int |
| met_o | Have you met this person before? (1 = yes, 2 = no) | predictor | ordinal | category |
| Name | Description | Role | Type | Format |
|---|---|---|---|---|
| age | Age of the person | predictor | numeric | int |
| field | Field of study | predictor | nominal | string |
| field_cd | Field of study coded. 1= Law 2= Math 3= Social Science, Psychologist 4= Medical Science, Pharmaceuticals, and Bio Tech 5= Engineering 6= English/Creative Writing/ Journalism 7= History/Religion/Philosophy 8= Business/Econ/Finance 9= Education, Academia 10= Biological Sciences/Chemistry/Physics 11= Social Work 12= Undergrad/undecided 13=Political Science/International Affairs 14=Film 15=Fine Arts/Arts Administration 16=Languages 17=Architecture 18=Other | predictor | nominal | category |
| mn_sat | Median SAT score for the undergraduate institution where attended. Proxy for intelligence. | | | |
| tuition | Tuition listed for each response to undergrad | | | |
| race | Race of the attendee 1 = Black/African American 2 = European/Caucasian-American 3 = Latino/Hispanic American 4 = Asian/Pacific Islander/Asian-American 5 = Native American 6 = Other | predictor | nominal | category |
| imprace | How important is it that a person you date be of the same racial/ethnic background? (1 - 10) | predictor | numeric | int |
| imprelig | How important is it that a person you date be of the same religious background? (1 - 10) | predictor | numeric | int |
| from | Where the person is originally from | predictor | nominal | string |
| zipcode | Zip code of the grow up area | predictor | nominal | category |
| income | Median household income based on zipcode | predictor | numeric | float |
| goal | What is the goal in participating in this event? 1 = Seemed like a fun night out 2 = To meet new people 3 = To get a date 4 = Looking for a serious relationship 5 = To say I did it 6 = Other | predictor | nominal | category |
| date | How frequently do you go on dates? 1 = Several times a week 2 = Twice a week 3 = Once a week 4 = Twice a month 5 = Once a month 6 = Several times a year 7 = Almost never | predictor | ordinal | category |
| go_out | How often do you go out (not necessarily on dates)? 1 = Several times a week 2 = Twice a week 3 = Once a week 4 = Twice a month 5 = Once a month 6 = Several times a year 7 = Almost never | predictor | ordinal | category |
| career | What is your intended career? | predictor | nominal | string |
| career_c | Career coded. 1 = Lawyer 2 = Academic/Research 3 = Psychologist 4 = Doctor/Medicine 5 = Engineer 6 = Creative Arts/Entertainment 7 = Banking/Consulting/Finance/Marketing/Business/CEO/Entrepreneur/Admin 8 = Real Estate 9 = International/Humanitarian Affairs 10 = Undecided 11 = Social Work 12 = Speech Pathology 13 = Politics 14 = Pro sports/Athletics 15 = Other 16 = Journalism 17 = Architecture | predictor | nominal | category |
| sports | Playing sports/athletics. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| tvsports | Watching sports. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| excersice | Body building/exercising. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| dining | Dining out. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| museums | Museums/galleries. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| art | Art. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| hiking | Hiking/camping. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| gaming | Gaming. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| clubbing | Dancing/clubbing. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| reading | Reading. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| tv | Watching TV. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| theater | Theater. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| movies | Movies. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| concerts | Going to concerts. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| music | Music. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| shopping | Shopping. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| yoga | Yoga/meditation. Interest in this Hobby from 1 - 10. | predictor | numeric | int |
| exphappy | Overall, how happy do you expect to be with the people you meet during the event? (1 - 10) | predictor | numeric | int |
| expnum | Out of 20 people, how many do you expect will be interested in dating you? | predictor | numeric | int |
| attr1_1 | What do you (personally) look for in the opposite sex. The sum of all attr1_1 elements must be 100. | predictor | numeric | float |
| sinc1_1 | What do you (personally) look for in the opposite sex. The sum of all attr1_1 elements must be 100. | predictor | numeric | float |
| intel1_1 | What do you (personally) look for in the opposite sex. The sum of all attr1_1 elements must be 100. | predictor | numeric | float |
| fun1_1 | What do you (personally) look for in the opposite sex. The sum of all attr1_1 elements must be 100. | predictor | numeric | float |
| amb1_1 | What do you (personally) look for in the opposite sex. The sum of all attr1_1 elements must be 100. | predictor | numeric | float |
| shar1_1 | What do you (personally) look for in the opposite sex. The sum of all attr1_1 elements must be 100. | predictor | numeric | float |
| attr4_1 | What do you think your fellow men/women look for in the opposite sex. The sum of all attr4_1 elements must be 100. | predictor | numeric | float |
| sinc4_1 | What do you think your fellow men/women look for in the opposite sex. The sum of all attr4_1 elements must be 100. | predictor | numeric | float |
| intel4_1 | What do you think your fellow men/women look for in the opposite sex. The sum of all attr4_1 elements must be 100. | predictor | numeric | float |
| fun4_1 | What do you think your fellow men/women look for in the opposite sex. The sum of all attr4_1 elements must be 100. | predictor | numeric | float |
| amb4_1 | What do you think your fellow men/women look for in the opposite sex. The sum of all attr4_1 elements must be 100. | predictor | numeric | float |
| shar4_1 | What do you think your fellow men/women look for in the opposite sex. The sum of all attr4_1 elements must be 100. | predictor | numeric | float |
| attr2_1 | What do you think the opposite sex looks for in a date. The sum of all attr2_1 elements must be 100. | predictor | numeric | float |
| sinc2_1 | What do you think the opposite sex looks for in a date. The sum of all attr2_1 elements must be 100. | predictor | numeric | float |
| intel2_1 | What do you think the opposite sex looks for in a date. The sum of all attr2_1 elements must be 100. | predictor | numeric | float |
| fun2_1 | What do you think the opposite sex looks for in a date. The sum of all attr2_1 elements must be 100. | predictor | numeric | float |
| amb2_1 | What do you think the opposite sex looks for in a date. The sum of all attr2_1 elements must be 100. | predictor | numeric | float |
| shar2_1 | What do you think the opposite sex looks for in a date. The sum of all attr2_1 elements must be 100. | predictor | numeric | float |
| attr3_1 | Rate yourself from 1 - 10. | predictor | numeric | int |
| sinc3_1 | Rate yourself from 1 - 10. | predictor | numeric | int |
| intel3_1 | Rate yourself from 1 - 10. | predictor | numeric | int |
| fun3_1 | Rate yourself from 1 - 10. | predictor | numeric | int |
| amb3_1 | Rate yourself from 1 - 10. | predictor | numeric | int |
| shar3_1 | Rate yourself from 1 - 10. | predictor | numeric | int |
| attr5_1 | How do you think others perceive you? 1 = awful, 10 = great | predictor | numeric | int |
| sinc5_1 | How do you think others perceive you? 1 = awful, 10 = great | predictor | numeric | int |
| intel5_1 | How do you think others perceive you? 1 = awful, 10 = great | predictor | numeric | int |
| fun5_1 | How do you think others perceive you? 1 = awful, 10 = great | predictor | numeric | int |
| amb5_1 | How do you think others perceive you? 1 = awful, 10 = great | predictor | numeric | int |
| shar5_1 | How do you think others perceive you? 1 = awful, 10 = great | predictor | numeric | int |
Careful: For all attributes `*_1`, `*_2` and `*_4`, participants in waves 6-9 rated the importance of the attributes in a potential date on a scale of 1-10 (1 = not at all important, 10 = extremely important).
Waves 1-5 and 10-21 distributed 100 points among the attributes. Total points must equal 100.
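Because the two scales are not directly comparable, one possible way to align them is to rescale every row so that its six attribute values sum to 100. The following is only a minimal sketch on made-up toy rows; the helper name `rescale_to_100` and the toy values are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Toy rows (not taken from the dataset): the first row is a 100-point
# allocation, the second row uses the 1-10 importance scale of waves 6-9.
pref_cols = ["attr1_1", "sinc1_1", "intel1_1", "fun1_1", "amb1_1", "shar1_1"]
toy = pd.DataFrame(
    [[35, 20, 20, 20, 0, 5],
     [8, 6, 8, 8, 4, 6]],
    columns=pref_cols,
    dtype="float",
)


def rescale_to_100(frame, cols):
    """Rescale each row so that the values in `cols` sum to 100."""
    row_sums = frame[cols].sum(axis=1)
    # A row sum of 0 would cause a division by zero; such rows become NaN.
    row_sums = row_sums.where(row_sums != 0)
    return frame[cols].div(row_sums, axis=0) * 100


rescaled = rescale_to_100(toy, pref_cols)
print(rescaled.round(1))
print(rescaled.sum(axis=1))
```

Rows that already sum to 100 are left unchanged by this rescaling, so the same function could in principle be applied to all waves.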
| Name | Description | Role | Type | Format |
|---|---|---|---|---|
| dec | Decision if you want to see the person again (1) or not (0) | predictor | nominal | category |
| attr | Rating of the attribute for this person from 1 - 10. | predictor | numeric | int |
| sinc | Rating of the attribute for this person from 1 - 10. | predictor | numeric | int |
| intel | Rating of the attribute for this person from 1 - 10. | predictor | numeric | int |
| fun | Rating of the attribute for this person from 1 - 10. | predictor | numeric | int |
| amb | Rating of the attribute for this person from 1 - 10. | predictor | numeric | int |
| shar | Rating of the attribute for this person from 1 - 10. | predictor | numeric | int |
| like | Overall, how much do you like this person? 1 (don't like at all) to 10 (like a lot) | predictor | numeric | int |
| prob | How probable do you think it is that this person will say 'yes' for you? 1 (not probable) to 10 (extremely probable) | predictor | numeric | int |
| met | Have you met this person before? (1 = yes, 2 = no) | predictor | ordinal | category |
| Name | Description | Role | Type | Format |
|---|---|---|---|---|
| attr1_s | What do you (personally) look for in the opposite sex. 1 - 10 rating. | predictor | numeric | int |
| sinc1_s | What do you (personally) look for in the opposite sex. 1 - 10 rating. | predictor | numeric | int |
| intel1_s | What do you (personally) look for in the opposite sex. 1 - 10 rating. | predictor | numeric | int |
| fun1_s | What do you (personally) look for in the opposite sex. 1 - 10 rating. | predictor | numeric | int |
| amb1_s | What do you (personally) look for in the opposite sex. 1 - 10 rating. | predictor | numeric | int |
| shar1_s | What do you (personally) look for in the opposite sex. 1 - 10 rating. | predictor | numeric | int |
| attr4_s | Rate yourself from 1 - 10 | predictor | numeric | int |
| sinc4_s | Rate yourself from 1 - 10 | predictor | numeric | int |
| intel4_s | Rate yourself from 1 - 10 | predictor | numeric | int |
| fun4_s | Rate yourself from 1 - 10 | predictor | numeric | int |
| amb4_s | Rate yourself from 1 - 10 | predictor | numeric | int |
| Name | Description | Role | Type | Format |
|---|---|---|---|---|
| satis_2 | Overall, how satisfied were you with the people you met? (1=not at all satisfied, 10=extremely satisfied) | predictor | numeric | int |
| length | Four minutes is: 1 = Too little, 2 = Too much, 3 = Just right | predictor | nominal | category |
| numdat_2 | The number of Speed "Dates" you had was: 1 = Too few, 2 = Too many, 3 = Just right | predictor | nominal | category |
... and again the same questions regarding attributes
| Name | Description | Role | Type | Format |
|---|---|---|---|---|
| you_call | How many have you contacted to set up a date? | predictor | numeric | int |
| them_cal | How many have contacted you? | predictor | numeric | int |
| date_3 | Have you been on a date with any of your matches? Yes=1 No=2 | predictor | nominal | category |
| numdat_3 | If yes, how many of your matches have you been on a date with so far? | predictor | numeric | int |
| num_in_3 | If yes, how many? | predictor | numeric | int |
... and again the same questions regarding attributes
Role: response, predictor, ID (ID columns are not used in a model but can help to better understand the data)
Type: nominal, ordinal or numeric
Format: int, float, string, category, date or object
| Name | Description | Descriptive term |
|---|---|---|
| calls | Event of a participant conducting a "you_call" or "them_cal" with the other party | Calls of participants |
| attr | Rating of the attribute for this person from 1 - 10. | Attractiveness of speed dating participant |
| sinc | Rating of the attribute for this person from 1 - 10. | Sincerity of speed dating participant |
| intel | Rating of the attribute for this person from 1 - 10. | Intelligence of speed dating participant |
| fun | Rating of the attribute for this person from 1 - 10. | Humor of speed dating participant |
| amb | Rating of the attribute for this person from 1 - 10. | Ambition of speed dating participant |
| shar | Rating of the attribute for this person from 1 - 10. | Shared Interests/Hobbies of the speed dating participant to the other party |
| like | Overall, how much do you like this person? 1 (don't like at all) to 10 (like a lot) | Strength of like of speed dating participant to the other party |
| prob | How probable do you think it is that this person will say 'yes' for you? 1 (not probable) to 10 (extremely probable) | Probability of speed dating participant to like the other party |
| met | Have you met this person before? (1 = yes, 2 = no) | Meeting indicator of participants |
| gender | Gender of the person. Female = 0, Male = 1 | Gender of speed dating participant |
| order | The number of date that night when met partner | Order of date of speed dating participant and the other party during event |
| match | 1 = yes, 0 = no | Match of the speed dating participant and the other party |
| int_corr | Correlation between participant's and partner's ratings of interests in Time 1 | Correlation of the speed dating participant and the other party |
| samerace | Participant and the partner were the same race. 1 = yes, 0 = no | Indicates, if the speed dating participant and the other party have the same race |
| age | Age of the person | Age of speed dating participant |
| age_o | Age of partner | Age of other party |
| race | Race of the attendee 1 = Black/African American 2 = European/Caucasian-American 3 = Latino/Hispanic American 4 = Asian/Pacific Islander/Asian-American 5 = Native American 6 = Other | Race of speed dating participant |
| race_o | Race of partner | Race of other party |
| imprace | How important is it that a person you date be of the same racial/ethnic background? (1 - 10) | Importance of the other party having the same race as the speed dating participant |
| intel_o | Intelligent. Rating by partner the night of the event from 1 (awful) to 10 (great) | Intelligence of the other party |
| sinc_o | Sincere. Rating by partner the night of the event from 1 (awful) to 10 (great) | Sincerity of the other party |
| like_o | Overall, how much do you like this person? 1 (don't like at all) to 10 (like a lot) | Strength of like of the other party to the speed dating participant |
| prob_o | How probable do you think it is that this person will say 'yes' for you? 1 (not probable) to 10 (extremely probable) | Probability of the other party to like speed dating participant |
| fun_o | Fun. Rating by partner the night of the event from 1 (awful) to 10 (great) | Humor of the other party |
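| satis_2 | Overall, how satisfied were you with the people you met? (1=not at all satisfied, 10=extremely satisfied) | Satisfaction of speed dating participant with the people met |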
| amb_o | Ambitious. Rating by partner the night of the event from 1 (awful) to 10 (great) | Ambition of the other party |
| shar_o | Shared Interests/Hobbies. Rating by partner the night of the event from 1 (awful) to 10 (great) | Shared Interests/Hobbies of the other party to speed dating participant |
| attr_o | Attractive. Rating by partner the night of the event from 1 (awful) to 10 (great) | Attractiveness of the other party |
| met_o | Have you met this person before? (1 = yes, 2 = no) | Meeting indicator of the other party |
| exphappy | Overall, on a scale of 1-10, how happy do you expect to be with the people you meet during the speed-dating event? | Expected Happiness of meeting people |
| pid | partner's iid number | partner's iid number |
In this section we import the libraries and functions we need to create our data frames, calculations and visualizations.
%matplotlib inline
import pickle
import pandas as pd
import altair as alt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import ConfusionMatrixDisplay, classification_report, RocCurveDisplay, roc_auc_score, make_scorer, precision_recall_curve, PrecisionRecallDisplay
import matplotlib.pyplot as plt
alt.data_transformers.disable_max_rows()
DataTransformerRegistry.enable('default')
Data was taken from here: https://perso.telecom-paristech.fr/eagan/class/igr204/datasets/SpeedDating.csv
We create our data frame out of the imported csv data from our data source
df = pd.read_csv("../data/interim/TransformedData",delimiter=",", index_col=0)
This dataset has 195 attributes, which is a lot. We already see some NaN values, which we will have to eliminate later.
df.head()
| | iid | id | gender | idg | condtn | wave | round | position | positin1 | order | ... | attr3_3 | sinc3_3 | intel3_3 | fun3_3 | amb3_3 | attr5_3 | sinc5_3 | intel5_3 | fun5_3 | amb5_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | 0 | 1 | 1 | 1 | 10 | 7 | NaN | 4 | ... | 5.0 | 7.0 | 7.0 | 7.0 | 7.0 | NaN | NaN | NaN | NaN | NaN |
| 1 | 1 | 1.0 | 0 | 1 | 1 | 1 | 10 | 7 | NaN | 3 | ... | 5.0 | 7.0 | 7.0 | 7.0 | 7.0 | NaN | NaN | NaN | NaN | NaN |
| 2 | 1 | 1.0 | 0 | 1 | 1 | 1 | 10 | 7 | NaN | 10 | ... | 5.0 | 7.0 | 7.0 | 7.0 | 7.0 | NaN | NaN | NaN | NaN | NaN |
| 3 | 1 | 1.0 | 0 | 1 | 1 | 1 | 10 | 7 | NaN | 5 | ... | 5.0 | 7.0 | 7.0 | 7.0 | 7.0 | NaN | NaN | NaN | NaN | NaN |
| 4 | 1 | 1.0 | 0 | 1 | 1 | 1 | 10 | 7 | NaN | 7 | ... | 5.0 | 7.0 | 7.0 | 7.0 | 7.0 | NaN | NaN | NaN | NaN | NaN |
5 rows × 195 columns
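To get a first impression of how many values are missing, the share of NaN values per column can be computed. The sketch below uses a small made-up frame in place of `df`; the values and the 50% threshold are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `df`; the values are invented for illustration.
toy = pd.DataFrame({
    "iid": [1, 1, 2, 2],
    "positin1": [np.nan, np.nan, np.nan, 3.0],
    "attr5_3": [np.nan, np.nan, np.nan, np.nan],
    "order": [4, 3, 10, 5],
})

# Share of missing values per column, largest share first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns with more than half of their values missing (hypothetical threshold).
mostly_missing = missing_share[missing_share > 0.5].index.tolist()
print(mostly_missing)
```

On the real data the same two expressions could be applied to `df` directly.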
Currently there are 175 float columns, 13 int columns and 7 object columns. We have already identified some columns that should rather be categorical.
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 8378 entries, 0 to 8377
Columns: 195 entries, iid to amb5_3
dtypes: float64(175), int64(13), object(7)
memory usage: 12.5+ MB
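One way to spot columns that might be better stored as categories is to look at the number of distinct values per column. This is only a rough heuristic sketched on toy data; the threshold `max_levels` is a hypothetical choice, and the data dictionary above remains the authoritative source for each variable's type.

```python
import pandas as pd

# Toy frame standing in for `df`; the values are invented for illustration.
toy = pd.DataFrame({
    "gender": [0, 1, 0, 1, 0, 1],
    "match": [0, 0, 1, 0, 1, 0],
    "int_corr": [0.14, 0.54, 0.16, 0.61, 0.21, 0.25],
    "age": [21, 24, 25, 23, 22, 27],
})

# Number of columns per dtype, comparable to the summary line of df.info().
n_by_dtype = toy.dtypes.astype(str).value_counts()
print(n_by_dtype)

# Columns with very few distinct values are candidates for the category dtype.
max_levels = 2  # hypothetical threshold, chosen for this toy example
candidates = [col for col in toy.columns if toy[col].nunique() <= max_levels]
print(candidates)
```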
We therefore created one list per data type so that we can map the attributes more easily.
cat_vars = [
"gender",
"condtn",
"match",
"samerace",
"race_o",
"dec_o",
"met_o",
"field_cd",
"race",
"zipcode",
"goal",
"date",
"go_out",
"career_c",
"dec",
"met",
"length",
"numdat_2",
"date_3",
]
float_vars = [
"int_corr",
"pf_o_att",
"pf_o_sin",
"pf_o_int",
"pf_o_fun",
"pf_o_amb",
"pf_o_sha",
"income",
"attr1_1",
"sinc1_1",
"intel1_1",
"fun1_1",
"amb1_1",
"shar1_1",
"attr4_1",
"sinc4_1",
"intel4_1",
"fun4_1",
"amb4_1",
"shar4_1",
"attr2_1",
"sinc2_1",
"intel2_1",
"fun2_1",
"amb2_1",
"shar2_1"
]
int_vars = [
"attr_o",
"sinc_o",
"intel_o",
"fun_o",
"amb_o",
"shar_o",
"like_o",
"prob_o",
"age",
"age_o",
"imprace",
"imprelig",
"sports",
"tvsports",
"excersice",
"dining",
"museums",
"art",
"hiking",
"gaming",
"clubbing",
"reading",
"tv",
"theater",
"movies",
"concerts",
"music",
"shopping",
"yoga",
"exhappy",
"attr3_1",
"sinc3_1",
"intel3_1",
"fun3_1",
"amb3_1",
"attr5_1",
"sinc5_1",
"intel5_1",
"fun5_1",
"amb5_1",
"attr",
"sinc",
"intel",
"fun",
"amb",
"shar",
"like",
"prob",
"attr1_s",
"sinc1_s",
"intel1_s",
"fun1_s",
"amb1_s",
"shar1_s",
"attr4_s",
"sinc4_s",
"intel4_s",
"fun4_s",
"amb4_s",
"satis_2",
"iid",
"id",
"idg",
"wave",
"round",
"order",
"partner",
"pid",
"expnum",
"you_call",
"them_cal",
"numdat_3",
"num_in_3",
"position",
"positin1",
]
str_vars = [
"field",
"from",
"career"
]
unused_vars = [
"undergrd",
"mn_sat",
"tuition"
]
Thanks to these lists we can just map all attributes in one go.
df[cat_vars]=df[cat_vars].astype("category",copy=False)
df[float_vars]=df[float_vars].astype("float",copy=False)
df[str_vars]=df[str_vars].astype("str",copy=False)
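Before casting, it can help to check that every name in the lists actually exists in the data frame, because a misspelled name would otherwise raise a `KeyError`. The sketch below shows one possible way to do this on toy data and then casts with a single dictionary passed to `astype`. The helper `split_known` is a hypothetical name, and the use of the `"string"` dtype (pandas' nullable string type, which keeps missing values missing) is an assumption that differs from the `"str"` cast used above.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `df`; the values are invented for illustration.
toy = pd.DataFrame({
    "gender": [0, 1, 0],
    "income": [69487.0, 65929.0, np.nan],
    "field": ["Law", None, "Economics"],
})

cat_vars = ["gender", "go_out"]  # "go_out" is deliberately absent from `toy`
float_vars = ["income"]
str_vars = ["field"]


def split_known(names, frame):
    """Split `names` into those present in and those missing from `frame`."""
    present = [name for name in names if name in frame.columns]
    missing = [name for name in names if name not in frame.columns]
    return present, missing


cat_present, cat_missing = split_known(cat_vars, toy)
print("not in the frame:", cat_missing)

# Build one mapping from column name to target dtype and cast in one call.
dtype_map = {name: "category" for name in cat_present}
dtype_map.update({name: "float" for name in float_vars})
dtype_map.update({name: "string" for name in str_vars})
typed = toy.astype(dtype_map)
print(typed.dtypes)
```

Depending on the pandas version, casting with `"str"` can turn missing values into literal text, so it is worth checking `isna()` counts before and after the cast.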
We assume that a call made by a participant ("you_call") and a call received from the other party ("them_cal") exclude each other, so we sum these two variables at this point.
df['calls'] = df['you_call'] + df['them_cal']
We already see that a lot of participants neither called nor were called (1,284), but on the other hand the majority (2,690) called or was called by at least one person.
df['calls'].value_counts()
0.0 1284
1.0 935
2.0 848
4.0 275
3.0 254
5.0 196
6.0 115
9.0 21
14.0 18
10.0 18
22.0 10
Name: calls, dtype: int64
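Note that `+` yields NaN as soon as one of the two values is missing, and `value_counts()` drops NaN by default, so rows without follow-up data do not appear in the counts above. The following sketch on toy data illustrates this behavior and shows `Series.add` with `fill_value=0` as one possible alternative; whether a single missing value should count as zero is an assumption that would need to be checked against the survey design.

```python
import numpy as np
import pandas as pd

# Toy values standing in for the two follow-up columns.
toy = pd.DataFrame({
    "you_call": [1.0, 0.0, np.nan, 2.0, 0.0],
    "them_cal": [1.0, 0.0, np.nan, np.nan, 3.0],
})

# `+` returns NaN as soon as one side is missing.
strict = toy["you_call"] + toy["them_cal"]

# Series.add with fill_value treats a single missing side as 0;
# rows where both sides are missing still stay NaN.
lenient = toy["you_call"].add(toy["them_cal"], fill_value=0)

n_no_calls = int((strict == 0).sum())
n_some_calls = int((strict > 0).sum())
n_unknown = int(strict.isna().sum())
print(n_no_calls, n_some_calls, n_unknown)
print(strict.value_counts(dropna=False))
```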
We can see that far more men call women (2,422) than the other way round (681).
On the other hand, both sexes report being called more often than the other sex reports calling: men report 1,035 received calls against 681 calls made by women, and women report 2,866 received calls against 2,422 calls made by men. These numbers may be biased, or the data may be incomplete.
alt.Chart(df).mark_bar().encode(
alt.X('gender', title='female,male'),
y='sum(you_call)',
color='gender',
tooltip='sum(you_call)'
) | alt.Chart(df).mark_bar().encode(
alt.X('gender', title='female,male'),
y='sum(them_cal)',
color='gender',
tooltip='sum(them_cal)'
).properties(
title='Distribution of calls per gender'
)
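The totals shown in the charts can also be computed as a table with `groupby`. The sketch below uses toy data and additionally illustrates a possible pitfall: if participant-level answers such as `you_call` are repeated on every row of a participant (an assumption about the data layout that should be verified), summing over all rows counts each participant several times. Keeping one row per `iid` is one way to avoid that.

```python
import pandas as pd

# Toy data: participant 1 (female) has two rows, participant 2 (male) three.
# The follow-up answers are repeated on every row of the same participant.
toy = pd.DataFrame({
    "iid": [1, 1, 2, 2, 2],
    "gender": [0, 0, 1, 1, 1],
    "you_call": [1, 1, 2, 2, 2],
    "them_cal": [0, 0, 3, 3, 3],
})
labels = {0: "female", 1: "male"}

# Row-level totals, comparable to sum(you_call) and sum(them_cal) in the charts.
row_totals = (
    toy.groupby("gender")[["you_call", "them_cal"]].sum().rename(index=labels)
)

# Participant-level totals: keep one row per participant before summing.
per_person = toy.drop_duplicates("iid")
person_totals = (
    per_person.groupby("gender")[["you_call", "them_cal"]]
    .sum()
    .rename(index=labels)
)

print(row_totals)
print(person_totals)
```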